I am a bit new to VB.NET. I have a page that sets 2 session variables and does a redirect to second page. The second pages is at least using one of the session variables. I can tell because on the second page, if the session variable is not correct the user is redirected to an access denied page. The second page also uses the session variable in question. It will read it an fill a gridview based on the value of the variable. I set the variable like so
Session("ID") = Convert.ToInt32(a_value)
and on the second page I retrieve the variable like this
a_page_variable = Session("ID")
What I find strange is that when I run this code in visual studio it works as expected but when I deploy and run it, I get 0 from my session variable instead of the true value of "a_value". I have tried a few things like making sure the data types match up from page to page and trying different ways to retrieve the variable such as
Session("userID")
and
CType(Session.Item("userID"), Int32)
I've also tried to see what is coming in to the second page by using
Response.Write
I also tried to use SQL Profiler to see what kind of call is being made to fill the gridview but I haven't had any luck. The gridview gives me an empty dataset and the Profiler does not detect a call being made from the application. I thought working with session variables was pretty straight forward but obviously, I am missing something.
Thanks for your help,
Billy
One possibility (and the only one that could be guessed at with how little information we have) could be the response.redirect causing the application to terminate due to an exception.
When redirecting, you want to always pass a false, and then call complete request.
Response.Redirect(urlstring, False)
Response.CompleteRequest()
not following these steps can cause exceptions, which may drop session.
Additionally, resolve virtual paths, as some browsers (mobile especially) can see those redirects as new requests entirely, thus generating new session tokens.
Dim urlstring As String
urlstring = Page.ResolveUrl("~/default.aspx")
that said, there are a number of possible causes for this situation.
Application Pool restarts
App Domain restarted
Code changing value unexpectedly
AV tinkering with files
deployed to web farm
With the description provided above, we just don't have enough information to really troubleshoot.
Thank you ADyson, Stephen Wrighton and everyone else who took a stab at helping me figure this out. I was able to find out what was going on by adding code that wrote to a log file on the server. Found the logging code here. I found that I never reached the code that set the session variable and that is the reason it never populated on the second page. I was trying to get the logon name for the user by using Environment.UserName which will return the user name of the person who is currently logged on to the operating system. But what I really wanted to do was get the logon name of the user that was visiting my site. For this I used User.Identity.Name. This works great when you need to know which user from an Active Directory domain is visiting your site.
I am a bit of a ASP.NET/VB noobie so sorry in advance for any stupid questions.
I am currently creating a ASP.NET application using VB. The application I have had is connected to a local Database and I am able redirect a user to a "Users" page once they have input their login credentials correctly.
What I want to happen is that once they login and are redirected to the "Users" page, there will be a message saying "Welcome" followed by their login name.
I have been able to successfully do this in C# but need to be able to do so in VB as well.
The code that worked in C# is as follows;
**Session["New"] = TextBoxUserName.Text;
Response.Write("Password is correct");
Response.Redirect("Users.aspx");**
I've tried putting this code straight into my VB project but have received the following error;
"Property access must assign to the property or use its value"
I've done some research into this error and can't seem to find any solution to the problem.
Is my syntax incorrect or are sessions used completely different in VB?
I haven't seen your VB Code but it should be:
Session("New") = TextBoxUserName.Text
''Response.Write("Password is correct"); - No point in having this line as the redirect overrides it quickly.
Response.Redirect("Users.aspx")
Note the use of round brackets rather than square brackets for VB, also note the lackof semi colons. VB.NET and C# are completely different syntax wise.
I have an aspx page, which is a "User Log-In" area. I want to pass the userid to another page which is linked from the aspx page.
the link I have looks something like this:
www.abcdefg.com/Home/Redirect/?authtkn=123456abcd=xxxx
I need the xxxx to be a session variable which in this case is userid.
**userid is not sensative information, this is simply to redirect the user to another page for specified information.
Any thoughts on how to pass a session variable to a URL, or if this can be done. The example www.abcdefg.com is a different domain (on a different server) from the original aspx page.
Why not appending like this?
string.Format("www.abcdefg.com/Home/Redirect/?authtkn=123456abcd={0}",
Session["UserId"]);
if i understood you correctly.
Think your question is how to maintain cross domain session or authentication.
Check this link Maintaining Session State Across Domains, may give you some idea
Or this one How can I share a session across multiple subdomains?
You don’t need to pass a session variable via the url. You can just start session on the next page and have access to all your session variables! If you do want the variables up there so you can quickly get to different userids with just a quick url change, send userid to the url on your previous page with with $_GET[ ].
I am kind of new to ASP.NET.
I wonder is there any way to restrict user can only enter from a specify page?
Like, I have a Page A to let them enter some information, then when submit, I will use Response.Redirect to Page B. But I don't want the user can go into Page B directly from URL....
If I use Session, then if the user didn't close the browser to end the session, the another user can just go into Page B directly...
Because the computer that access to these pages is using by the public, so I want to see if there is anyway to make sure they only do one way process? Can't go back to previous or jump to another page.
Thanks in Advance.
If you control the other page, start a session and set a session variable to a value that can be reversed that only your server could (or should) create, much like serial keys. For example 72150166 because the sum of every second number equals the sum of every other number (7 + 1 + 0 + 6 = 2 + 5 + 1 + 6) but you could choose an algorithm as complex or as simple as you want. When the user navigates to the second page, check the session variable. This is not invincible security, but it is better than checking the referrer (especially since some browsers do not set it) and I imagine security based on coming from a certain page doesn't have to be that strict.
Edit: You should also add it to a database and link it with the particular user's IP so someone else can't use the same key.
You can use Request.UrlReferrer property in the Page Load of PageB to see which page is the request coming from. If the request is not coming from PageA then redirect the user to PageA.
Check this link for more information: http://msdn.microsoft.com/en-us/library/system.web.httprequest.urlreferrer.aspx
Note: UrlReferrer is dependent on a request header and someone can set the header to mimic the request coming from PageA.
You could have the page that redirects send some sort of specifically generated hash/key in the query string that expires quickly and/or once viewed. This should be a lot more solid on the security side.
You will still need some way to store this key or value producing the hash so you can validate it on the receiving end- I would think your DB.
I need to scrape query results from an .aspx web page.
http://legistar.council.nyc.gov/Legislation.aspx
The url is static, so how do I submit a query to this page and get the results? Assume we need to select "all years" and "all types" from the respective dropdown menus.
Somebody out there must know how to do this.
As an overview, you will need to perform four main tasks:
to submit request(s) to the web site,
to retrieve the response(s) from the site
to parse these responses
to have some logic to iterate in the tasks above, with parameters associated with the navigation (to "next" pages in the results list)
The http request and response handling is done with methods and classes from Python's standard library's urllib and urllib2. The parsing of the html pages can be done with Python's standard library's HTMLParser or with other modules such as Beautiful Soup
The following snippet demonstrates the requesting and receiving of a search at the site indicated in the question. This site is ASP-driven and as a result we need to ensure that we send several form fields, some of them with 'horrible' values as these are used by the ASP logic to maintain state and to authenticate the request to some extent. Indeed submitting. The requests have to be sent with the http POST method as this is what is expected from this ASP application. The main difficulty is with identifying the form field and associated values which ASP expects (getting pages with Python is the easy part).
This code is functional, or more precisely, was functional, until I removed most of the VSTATE value, and possibly introduced a typo or two by adding comments.
import urllib
import urllib2
uri = 'http://legistar.council.nyc.gov/Legislation.aspx'
#the http headers are useful to simulate a particular browser (some sites deny
#access to non-browsers (bots, etc.)
#also needed to pass the content type.
headers = {
'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13',
'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml; q=0.9,*/*; q=0.8',
'Content-Type': 'application/x-www-form-urlencoded'
}
# we group the form fields and their values in a list (any
# iterable, actually) of name-value tuples. This helps
# with clarity and also makes it easy to later encoding of them.
formFields = (
# the viewstate is actualy 800+ characters in length! I truncated it
# for this sample code. It can be lifted from the first page
# obtained from the site. It may be ok to hardcode this value, or
# it may have to be refreshed each time / each day, by essentially
# running an extra page request and parse, for this specific value.
(r'__VSTATE', r'7TzretNIlrZiKb7EOB3AQE ... ...2qd6g5xD8CGXm5EftXtNPt+H8B'),
# following are more of these ASP form fields
(r'__VIEWSTATE', r''),
(r'__EVENTVALIDATION', r'/wEWDwL+raDpAgKnpt8nAs3q+pQOAs3q/pQOAs3qgpUOAs3qhpUOAoPE36ANAve684YCAoOs79EIAoOs89EIAoOs99EIAoOs39EIAoOs49EIAoOs09EIAoSs99EI6IQ74SEV9n4XbtWm1rEbB6Ic3/M='),
(r'ctl00_RadScriptManager1_HiddenField', ''),
(r'ctl00_tabTop_ClientState', ''),
(r'ctl00_ContentPlaceHolder1_menuMain_ClientState', ''),
(r'ctl00_ContentPlaceHolder1_gridMain_ClientState', ''),
#but then we come to fields of interest: the search
#criteria the collections to search from etc.
# Check boxes
(r'ctl00$ContentPlaceHolder1$chkOptions$0', 'on'), # file number
(r'ctl00$ContentPlaceHolder1$chkOptions$1', 'on'), # Legislative text
(r'ctl00$ContentPlaceHolder1$chkOptions$2', 'on'), # attachement
# etc. (not all listed)
(r'ctl00$ContentPlaceHolder1$txtSearch', 'york'), # Search text
(r'ctl00$ContentPlaceHolder1$lstYears', 'All Years'), # Years to include
(r'ctl00$ContentPlaceHolder1$lstTypeBasic', 'All Types'), #types to include
(r'ctl00$ContentPlaceHolder1$btnSearch', 'Search Legislation') # Search button itself
)
# these have to be encoded
encodedFields = urllib.urlencode(formFields)
req = urllib2.Request(uri, encodedFields, headers)
f= urllib2.urlopen(req) #that's the actual call to the http site.
# *** here would normally be the in-memory parsing of f
# contents, but instead I store this to file
# this is useful during design, allowing to have a
# sample of what is to be parsed in a text editor, for analysis.
try:
fout = open('tmp.htm', 'w')
except:
print('Could not open output file\n')
fout.writelines(f.readlines())
fout.close()
That's about it for the getting of the initial page. As said above, then one would need to parse the page, i.e. find the parts of interest and gather them as appropriate, and store them to file/database/whereever. This job can be done in very many ways: using html parsers, or XSLT type of technogies (indeed after parsing the html to xml), or even for crude jobs, simple regular-expression. Also, one of the items one typically extracts is the "next info", i.e. a link of sorts, that can be used in a new request to the server to get subsequent pages.
This should give you a rough flavor of what "long hand" html scraping is about. There are many other approaches to this, such as dedicated utilties, scripts in Mozilla's (FireFox) GreaseMonkey plug-in, XSLT...
Most ASP.NET sites (the one you referenced included) will actually post their queries back to themselves using the HTTP POST verb, not the GET verb. That is why the URL is not changing as you noted.
What you will need to do is look at the generated HTML and capture all their form values. Be sure to capture all the form values, as some of them are used to page validation and without them your POST request will be denied.
Other than the validation, an ASPX page in regards to scraping and posting is no different than other web technologies.
Selenium is a great tool to use for this kind of task. You can specify the form values that you want to enter and retrieve the html of the response page as a string in a couple of lines of python code.
Using Selenium you might not have to do the manual work of simulating a valid post request and all of its hidden variables, as I found out after much trial and error.
The code in the other answers was useful; I never would have been able to write my crawler without it.
One problem I did come across was cookies. The site I was crawling was using cookies to log session id/security stuff, so I had to add code to get my crawler to work:
Add this import:
import cookielib
Init the cookie stuff:
COOKIEFILE = 'cookies.lwp' # the path and filename that you want to use to save your cookies in
cj = cookielib.LWPCookieJar() # This is a subclass of FileCookieJar that has useful load and save methods
Install CookieJar so that it is used as the default CookieProcessor in the default opener handler:
cj.load(COOKIEFILE)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
To see what cookies the site is using:
print 'These are the cookies we have received so far :'
for index, cookie in enumerate(cj):
print index, ' : ', cookie
This saves the cookies:
cj.save(COOKIEFILE) # save the cookies
"Assume we need to select "all years" and "all types" from the respective dropdown menus."
What do these options do to the URL that is ultimately submitted.
After all, it amounts to an HTTP request sent via urllib2.
Do know how to do '"all years" and "all types" from the respective dropdown menus' you do the following.
Select '"all years" and "all types" from the respective dropdown menus'
Note the URL which is actually submitted.
Use this URL in urllib2.